SALSA: Sequence ALignment via Steiner Ancestors
Abstract
We describe SALSA (Sequence ALignment via Steiner Ancestors), a public-domain suite of programs for generating multiple alignments of a set of genomic sequences. We allow the use of either of the two popular objectives, Tree Alignment or Sum-of-Pairs. The main distinguishing feature of our method is that the alignment is obtained via a tree in which the internal nodes (ancestors) are labeled by Steiner sequences for triples of the input sequences. Given lists of candidate labels for the ancestral sequences, we use dynamic programming to choose an optimal labeling under either objective function. The fully labeled tree of sequences is then turned into a multiple alignment. Enhancements in our implementation include the traditional space-saving ideas of Hirschberg as well as new data-packing techniques. The running-time bottleneck of computing exact Steiner sequences is handled by a highly effective but much faster heuristic alternative. Finally, other modules in the suite allow automatic generation of linear-program input files that can be used to compute novel lower bounds on the optimal values. We also report on some preliminary computational experiments with SALSA.
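The labeling step mentioned in the abstract can be illustrated with a minimal sketch. It is not SALSA's code: it assumes unit-cost edit distance as the edge cost, handles only the Tree Alignment objective (the sum of edge costs over the tree), and takes the candidate lists as given. The names `edit_distance`, `label_tree`, `children`, and `candidates` are hypothetical and chosen for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def label_tree(children, candidates, root):
    """Pick one candidate per node so the sum of edge costs is minimal.

    children:   dict mapping a node to its list of child nodes
                (leaves may be absent or map to an empty list)
    candidates: dict mapping a node to its list of candidate sequences
                (a leaf has exactly one candidate, its input sequence)
    Returns (total_cost, labeling) where labeling maps node -> sequence.
    """
    best = {}  # node -> list of (subtree cost, chosen child indices), one per candidate

    def solve(node):
        kids = children.get(node, [])
        for kid in kids:
            solve(kid)
        table = []
        for seq in candidates[node]:
            cost, picks = 0, []
            for kid in kids:
                # Cost of each child candidate: its subtree cost plus the edge cost.
                options = [best[kid][i][0] + edit_distance(seq, other)
                           for i, other in enumerate(candidates[kid])]
                i_min = min(range(len(options)), key=options.__getitem__)
                cost += options[i_min]
                picks.append(i_min)
            table.append((cost, picks))
        best[node] = table

    def trace(node, idx, labeling):
        # Follow the recorded choices from the root down to the leaves.
        labeling[node] = candidates[node][idx]
        for kid, i in zip(children.get(node, []), best[node][idx][1]):
            trace(kid, i, labeling)

    solve(root)
    root_idx = min(range(len(best[root])), key=lambda i: best[root][i][0])
    labeling = {}
    trace(root, root_idx, labeling)
    return best[root][root_idx][0], labeling


if __name__ == "__main__":
    children = {"r": ["a", "b", "c"]}
    candidates = {"r": ["ACGT", "TTTT"], "a": ["ACGT"], "b": ["ACGA"], "c": ["ACCT"]}
    print(label_tree(children, candidates, "r"))
```

Because each node's table depends only on the tables of its children, the work is proportional to the number of candidate pairs across tree edges times the cost of one pairwise comparison. A Sum-of-Pairs variant would need a different cost function and is not covered by this sketch.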
Similar resources
SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments
MOTIVATION Optimal sequence alignment based on the Smith-Waterman algorithm is usually too computationally demanding to be practical for searching large sequence databases. Heuristic programs like FASTA and BLAST have been developed which run much faster, but at the expense of sensitivity. RESULTS In an effort to approximate the sensitivity of an optimal alignment algorithm, a new algorithm h...
Steiner Points in the Space of Genome Rearrangements
We present some experiences with the problem of multiple genome comparison, analogous to multiple sequence alignment in sequence comparison, under the inversion and transposition distance metrics, given a fixed phylogeny. We first describe a heuristic for the case in which the phylogeny is a star on three vertices and then use this to approximate the multiple genome comparison problem via local search.
Attacking Generalized Tree Alignment
Many multiple alignment methods implicitly or explicitly try to minimize the amount of biological change implied by an alignment. At the level of sequences, biological change is measured along a phylogenetic tree, a structure frequently being predicted only after the multiple alignment instead of together with it. The Generalized Tree Alignment problem addresses both questions simultaneously. I...
Regular Language Constrained Sequence Alignment Revisited
Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings...
Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.
A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unli...
Publication date: 2007